A comparison of grapheme and phoneme-based units for Spanish spoken term detection
نویسندگان
چکیده
The ever-increasing volume of audio data available online through the world wide web means that automatic methods for indexing and search are becoming essential. Hidden Markov model (HMM) keyword spotting and lattice search techniques are the two most common approaches used by such systems. In keyword spotting, models or templates are defined for each search term prior to accessing the speech and used to find matches. Lattice search (referred to as spoken term detection), uses a pre-indexing of speech data in terms of word or sub-word units, which can then quickly be searched for arbitrary terms without referring to the original audio. In both cases, the search term can be modelled in terms of sub-word units, typically phonemes. For in-vocabulary words (i.e. words that appear in the pronunciation dictionary), the letter-to-sound conversion systems are accepted to work well. However, for out-ofvocabulary (OOV) search terms, letter-to-sound conversion must be used to generate a pronunciation for the search term. This is usually a hard decision (i.e. not probabilistic and with no possibility of backtracking), and errors introduced at this step are difficult to recover from. We therefore propose the direct use of graphemes (i.e., letter-based sub-word units) for acoustic modelling. This is expected to work particularly well in languages such as Spanish, where despite the letter-to-sound mapping being very regular, the correspondence is not one-to-one, and there will be benefits from avoiding hard decisions at early stages of processing. In this article, we compare three approaches for Spanish keyword spotting or spoken term detection, and within each of these we compare acoustic modelling based on phone and grapheme units. Experiments were performed using the Spanish geographical-domain ALBAYZIN corpus. Results achieved in the two approaches proposed for spoken term detection show us that trigrapheme units for acoustic modelling match or exceed the performance of phone-based acoustic models. In the method proposed for keyword spotting, the results achieved with each acoustic model are very similar. 2008 Elsevier B.V. All rights reserved.
منابع مشابه
Grapheme based speech recognition
Large vocabulary speech recognition systems traditionally represent words in terms of subword units, usually phonemes. This paper investigates the potential of graphemes acting as subunits. In order to develop context dependent grapheme based speech recognizers several decision tree based clustering procedures are performed and compared to each other. Grapheme based speech recognizers in three ...
متن کاملGraphemes as basic units for crosslingual speech recognition
This paper presents our work on grapheme based crosslingual speech recognition carried out within the MASPER initiative. The performance of monolingual grapheme based acoustic models is compared to the performance of monolingual acoustic models based on phonemes. The transfer between source and target language was done using an expert knowledge approach. For the experiments, German, Spanish, Hu...
متن کاملSpanish dialects: phonetic transcription
It is well known that canonical Spanish, the dialectal variant ‘central’ of Spain, so called Castilian, can be transcribed by rules. This paper deals with the automatic grapheme to phoneme transcription rules in several Spanish dialects from Latin America. Spanish is a language spoken by more than 300 million people, has an important geographical dispersion compared among other languages and ha...
متن کاملHybrid Grapheme to Phoneme Conversion forUnlimited
Both dictionary-based and rule-based methods on grapheme-to-phoneme conversion have their own advantages and limitations. For example, a large sized phonetic dictionary and complex morphophonemic rules are required for the dictionary-based method and the LTS(letter to sound) rule-based method itself cannot model the complete morphophonemic constraints. This paper describes a grapheme-to-phoneme...
متن کاملImproving grapheme-based ASR by probabilistic lexical modeling approach
There is growing interest in using graphemes as subword units, especially in the context of the rapid development of hidden Markov model (HMM) based automatic speech recognition (ASR) system, as it eliminates the need to build a phoneme pronunciation lexicon. However, directly modeling the relationship between acoustic feature observations and grapheme states may not be always trivial. It usual...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
- Speech Communication
دوره 50 شماره
صفحات -
تاریخ انتشار 2008